Abstract. We investigate a variety of statistical properties associated with the
number of distinct degrees that exist in a typical network for various classes of networks.
For a single realization of a network with $N$ nodes that is drawn from an ensemble in
which the number of nodes of degree $k$ has an algebraic tail, $N_k \sim N/k^{\nu}$ for $k \gg 1$, the
number of distinct degrees grows as $N^{1/\nu}$. Such an algebraic growth is also observed in
scientific citation data. We also determine the $N$ dependence of statistical quantities
associated with the sparse, large-$k$ range of the degree distribution, such as the location
of the first hole (where $N_k = 0$), the last doublet (two consecutive occupied degrees),
triplet, dimer ($N_k = 2$), trimer, etc.



PACS numbers: 89.75.Fb, 02.50.Cw, 05.40.-a
1. Introduction 



A complete microscopic representation of a macroscopic system is usually unavailable
and often unnecessary, especially if the system is evolving or it is taken from an
ensemble and the goal is to understand the typical features of the ensemble. Thus
instead of determining a huge number of parameters (such as the $10^{23}$ coordinates and
momenta of atoms), it often suffices to know a few useful macroscopic quantities (like
the total number of atoms and the total energy) to understand the bulk properties of a
macroscopic system.

In the realm of networks, one usually starts with an ensemble of large networks that 
are generated according to a specified and not completely deterministic algorithm. In 
analogy with other bulk systems, we are typically interested in macroscopic-like network 
characteristics, such as the total number of links, the total number of triangles, the total 
number of clusters (maximal connected components), etc. [1]. Two of the most useful 
macroscopic characteristics are the cluster-size distribution and the degree distribution. 
The degree of a node (the number of links attached to the node) is perhaps the
simplest local network characteristic. It has now been extensively studied, with an
emphasis on networks with broadly distributed degrees [2]. Here we analyze the number
of distinct degrees $D_N$ that exist for a given network of size $N$. The number $D_N$ varies
from realization to realization, but for the ensembles that we study $D_N$ turns out to be
a self-averaging quantity, so that its mean value is the most important characteristic.
We focus on $\langle D_N\rangle$, which we generally write as $D_N$ when no ambiguity is possible.

We also investigate the locations of the first hole (the smallest $k$ where $N_k$ equals
zero), the last doublet (the largest $k$ value for which $N_k > 0$ and $N_{k+1} > 0$), last triplet,
the last dimer (the largest $k$ value where $N_k = 2$), trimer, etc. in the degree distribution
(Fig. 1).




Figure 1. A network of 16 nodes, with node degrees indicated. In this example,
$N_k = \{7,4,1,0,2,1,0,1\}$. The number of distinct degrees $D_{16} = 6$, the last doublet
occurs at $k = 5$, the last dimer also at $k = 5$, the first hole at $k = 4$, and $k_{\max} = 8$.
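The definitions above can be checked directly on the histogram of Figure 1. The following is a minimal sketch; the function name `support_statistics` and the convention that `counts[0]` holds $N_1$ are illustrative assumptions, not part of the original text.

```python
def support_statistics(counts):
    """Support statistics of a degree histogram.

    counts[i] is N_k for degree k = i + 1 (so counts[0] is N_1).
    This indexing convention is an assumption of the sketch.
    """
    degrees = range(1, len(counts) + 1)
    occupied = [k for k, n in zip(degrees, counts) if n > 0]
    occupied_set = set(occupied)
    holes = [k for k, n in zip(degrees, counts) if n == 0]
    # A doublet at k means that degrees k and k + 1 are both occupied.
    doublets = [k for k in occupied if k + 1 in occupied_set]
    # A dimer at k means that exactly two nodes have degree k.
    dimers = [k for k, n in zip(degrees, counts) if n == 2]
    return {
        "distinct": len(occupied),
        "k_max": occupied[-1] if occupied else None,
        # If every listed degree is occupied, the first hole lies just past the list.
        "first_hole": holes[0] if holes else len(counts) + 1,
        "last_doublet": doublets[-1] if doublets else None,
        "last_dimer": dimers[-1] if dimers else None,
    }


# Histogram quoted in the caption of Figure 1 (16 nodes in total).
figure1 = support_statistics([7, 4, 1, 0, 2, 1, 0, 1])
```

Applied to the Figure 1 histogram, the sketch returns the values quoted in the caption: six distinct degrees, first hole at 4, last doublet and last dimer at 5, and largest degree 8.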



The number of distinct degrees $D_N$ exhibits interesting behavior for network
ensembles in which the degree distribution has an algebraic tail; hence we focus on
such networks. For concreteness, we consider networks that are grown by preferential
attachment. The best-known case is strictly linear preferential attachment [3, 4, 5, 6, 7, 8],
in which a new node attaches to a pre-existing node of degree $k$ with rate $A_k = k$.
To illustrate the quantities studied here, we plot the degree distribution for a realization
of such a network of $N = 10^7$ nodes (Fig. 2). For small $k$, every degree is represented,
that is, $N_k > 0$. As $k$ increases, eventually a point is reached where $N_k$ first equals
zero; this defines the first "hole" in the degree distribution. Holes become progressively
more common for larger $k$ and eventually the distribution becomes sparse. Figure 2
also indicates the position of the last doublet, the largest $k$ for which $N_k > 0$ for two
consecutive $k$ values, while the last dimer is defined as the largest $k$ value for which
$N_k = 2$. One can analogously define the last triplet and last trimer, etc. As $k$ continues
to increase, the degree distribution is non-zero at progressively more isolated $k$ values
and eventually the distribution terminates when the largest network degree $k_{\max}$ is reached.

One of our principal results is that

$$D_N \simeq \Gamma\!\left(1-\frac{1}{\nu}\right)(RN)^{1/\nu} \tag{1}$$

for networks whose degree distribution has the algebraic tail

$$N_k \simeq NRk^{-\nu} \quad \text{when } k \gg 1, \tag{2}$$

where $R$ is a constant of the order of 1.

Figure 2. Number of nodes of degree $k > 10$ for a single network realization of
$N = 10^7$ nodes that is grown by strictly linear preferential attachment. The largest
degree is $k_{\max} = 6693$, $D_N = 465$, the last doublet occurs at $k = 782$, the last dimer
at $k = 641$, the last triplet at $k = 518$, the last trimer at $k = 500$, and the first hole at
$k = 201$ (arrows).

The behavior of $D_N$ parallels that of Heaps' law of linguistics [9, 10], in which the
number of distinct words in a large corpus of $N$ words grows sub-linearly with $N$. Recent
work [11, 12, 13] has related the $N$ dependence in Heaps' law to the dependence of word
frequency versus rank in this same corpus, known as Zipf's law [14]. Because of the simplicity
and explicitness of scale-free network models, we can quantify the statistical properties
of $D_N$ more precisely than in word-frequency statistics. It is also worth noting that
the number of distinct degrees in a particular realization of a network is reminiscent of
the "graphicality" of a network. Namely, given a set of disconnected nodes, each with a
specified degree, one can ask which degree sequences allow all the nodes to be connected
into a single component without multiple links between the same nodes [15, 16, 17]. The
number of distinct degrees provides complementary information about which degree
sequences are actually realized in a complex network.

2. Distinct Degrees 

Consider networks whose degree distribution has the asymptotic power-law form of
Eq. (2). We deal only with sparse networks, for which $\nu > 2$. A network with such a
degree distribution can be easily constructed by the redirection algorithm [18], in which
a new node either attaches to a randomly selected "target" node with probability $1-r$ or
to the ancestor of the target with probability $r$. This algorithm generates a scale-free
network whose growth rule is precisely shifted linear preferential attachment, with the
attachment rate to a node of degree $k$, $A_k = k + \lambda$, with $\lambda = \frac{1}{r} - 2$. This growth rule
leads to a degree distribution that has the form (2) with exponent $\nu = 1 + \frac{1}{r}$. We use
this redirection algorithm for our simulations and interchangeably refer to the growth
mechanism as either shifted linear preferential attachment or redirection.
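A minimal sketch of growth by redirection is given below, assuming that a new node attaches to the target with probability $1-r$ and to the target's ancestor with probability $r$. The seed configuration (two linked nodes, each treated as the other's ancestor) and the names `grow_by_redirection` and `count_distinct_degrees` are assumptions made for illustration; the text does not specify the initial condition used in the simulations.

```python
import random


def grow_by_redirection(n_nodes, r, seed=None):
    """Grow a tree of n_nodes nodes by redirection and return the degree list.

    Assumed seed graph: nodes 0 and 1 joined by one link, each acting as the
    other's ancestor (a hypothetical choice for this sketch).
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    ancestor = [1, 0]
    degree = [1, 1]
    for new in range(2, n_nodes):
        target = rng.randrange(new)  # uniformly chosen existing node
        # Redirect to the ancestor with probability r, else keep the target.
        attach_to = ancestor[target] if rng.random() < r else target
        ancestor.append(attach_to)
        degree[attach_to] += 1
        degree.append(1)  # the new node starts with a single link
    return degree


def count_distinct_degrees(degree):
    """Number of distinct degree values present in one realization."""
    return len(set(degree))


degrees = grow_by_redirection(5000, r=0.5, seed=7)
distinct = count_distinct_degrees(degrees)
```

Averaging `distinct` over many seeds for several values of `n_nodes` gives an estimate of $D_N$ that can be compared with the growth law $N^{1/\nu}$, where $\nu = 1 + 1/r$.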




Figure 3. Average number of distinct degrees $D_N$ versus $N$ for networks that are
grown by redirection with redirection probability $r$. The upper curve ($\circ$) corresponds
to $A_k = k - \frac{1}{2}$ or to redirection probability $r = \frac{2}{3}$. Here the degree distribution
exponent is 5/2 and $D_N = BN^{2/5}$, with $B = (3/2)^{2/5}\pi^{-1/5}\Gamma(3/5) = 1.393019\ldots$
The lower curve ($\triangle$) corresponds to $A_k = k$ or $r = \frac{1}{2}$. Here $D_N$ is given by (5). Each
data point represents an average over 10** realizations. The dashed lines correspond to
the theoretical prediction (1).



To determine the number of distinct degrees that appear in a typical realization of
a large network, first notice that $N_k > 1$ for $k$ in the range $k < K = (NR)^{1/\nu}$. In this
dense regime of the degree distribution (Fig. 2), all degrees with $k < K$ are present.
This range therefore gives a contribution of $(NR)^{1/\nu}$ to $D_N$. In the complementary
sparse range of $k > K$, we estimate the number of distinct degrees by integrating the
degree distribution for $k > K$, which gives $\int_K^\infty NRk^{-\nu}\,dk = K/(\nu-1)$. Adding the
contributions from the dense and sparse regimes gives

$$D_N^{\rm naive} = \frac{\nu}{\nu-1}\,K, \qquad K = (NR)^{1/\nu}. \tag{3}$$



While the $N$-dependence is correct, $D_N \sim N^{1/\nu}$, the amplitude is wrong. A better
estimate can be obtained by assuming that the probability distribution for the number
of nodes of each degree $k$ is the Poisson distribution with average value $N_k$ given by
(2). Then $P_k \equiv {\rm Prob}[(\#\text{ nodes of degree } k) \ge 1] = 1 - \exp(-N_k)$. Using this property
leads to a more accurate estimate

$$D_N = \sum_{k\ge 1}\left(1 - e^{-N_k}\right). \tag{4}$$
Replacing the sum by an integral, we ultimately obtain (1). For strictly linear
preferential attachment, $R = 4$ and $\nu = 3$, so that

$$D_N = BN^{1/3}, \qquad B = 2^{2/3}\,\Gamma\!\left(\tfrac{2}{3}\right) = 2.149528\ldots \tag{5}$$

In contrast, the naive estimate (3) for the amplitude is $B^{\rm naive} = 3\cdot 2^{-1/3} = 2.381101\ldots$,
which exceeds the more accurate value by $\approx 11\%$. Generally $D_N/D_N^{\rm naive} = \Gamma\!\left(2-\frac{1}{\nu}\right)$,
so for the admissible range of $2 < \nu < \infty$, this ratio monotonically increases from
$\sqrt{\pi}/2 = 0.886227\ldots$ to 1. As shown in Fig. 3, simulation
results are in excellent agreement with our theoretical predictions. A more detailed
asymptotic analysis indicates that the average number of distinct degrees admits the
expansion $D_N = BN^{1/3} + C + \ldots$ for strictly linear preferential attachment. This allows
us to extract a precise estimate of $B$ from the data that is in excellent agreement with
Eq. (5).
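The relation between the Poisson sum (4), the asymptotic result (1), and the naive estimate (3) can be checked numerically. The sketch below assumes the pure power law $N_k = NRk^{-\nu}$ for all $k \ge 1$ (the true distribution deviates at small $k$, where the summand is essentially 1 anyway), truncates the sum at a cutoff, and adds the remaining tail as an integral. The function names, the cutoff factor, and the choice $N = 10^7$ are illustrative assumptions.

```python
import math


def distinct_degrees_poisson(n, r_amp, nu, cutoff_factor=100):
    """Evaluate sum_{k>=1} (1 - exp(-N_k)) with N_k = n * r_amp * k**(-nu)."""
    k_dense = (n * r_amp) ** (1.0 / nu)
    k_cut = int(cutoff_factor * k_dense)
    total = 0.0
    for k in range(1, k_cut + 1):
        mean_count = n * r_amp * k ** (-nu)
        total += -math.expm1(-mean_count)  # 1 - exp(-N_k), accurate for small N_k
    # Beyond the cutoff 1 - exp(-N_k) is close to N_k, so integrate the tail.
    total += n * r_amp * k_cut ** (1.0 - nu) / (nu - 1.0)
    return total


def distinct_degrees_asymptotic(n, r_amp, nu):
    """Gamma(1 - 1/nu) * (R N)^(1/nu), the form of Eq. (1)."""
    return math.gamma(1.0 - 1.0 / nu) * (n * r_amp) ** (1.0 / nu)


def distinct_degrees_naive(n, r_amp, nu):
    """nu / (nu - 1) * (N R)^(1/nu), the form of Eq. (3)."""
    return nu / (nu - 1.0) * (n * r_amp) ** (1.0 / nu)


# Strictly linear preferential attachment: R = 4, nu = 3.
n, r_amp, nu = 10**7, 4, 3
d_sum = distinct_degrees_poisson(n, r_amp, nu)
d_asym = distinct_degrees_asymptotic(n, r_amp, nu)
d_naive = distinct_degrees_naive(n, r_amp, nu)
```

For these parameters the sum and the asymptotic formula differ by an amount of order one, which is consistent with a constant correction term in the expansion of $D_N$, while the ratio `d_asym / d_naive` equals $\Gamma(2 - 1/\nu)$.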

Figure 4. The maximum degree (number of citations) and the number of distinct
degrees for the Physical Review citation network during the period 1893-2003. Also
shown are the locations of the last dimer and the first hole. The dashed lines are 
the power-law fits with respective exponents 0.849. 0.60. 474, and 0.430. The data are 
measured at 20 equally spaced network sizes as discussed in the text.



The general behavior outlined above for the number of distinct degrees and related
quantities is also observed in the citation network of the Physical Review. Because this
journal has grown roughly exponentially with time [20, 21], it is not appropriate to use
publication date as a proxy for the network size. Since the citation data is presented
as a list of links, each in the form citing paper → cited paper, it is more natural to
use the chronologically ordered number of links as the proxy for network size. We use
the Physical Review citation data as of 2003, which contains $L = 3{,}110{,}866$ total links
(citations). The maximum network degree (the highest-cited paper), the location of the
last dimer, the number of distinct degrees, and the location of the first hole are
measured when the network size is $\frac{m}{20}L$, with $m = 1, 2, \ldots, 20$ (Fig. 4).



Naive power-law fits to the first three datasets in Fig. H] give /Cmax ~ L°-^^^, 
Dl ~ L°'^^^, and (hi) ~ L^-^"^^. Let us provisionally assume that the citation distribution 
has a power-law dependence on $L$ and, by implication, the same dependence on $N$ ‡.
Using $k_{\max} \sim N^{1/(\nu-1)}$ and the dependences for the number of distinct degrees and
location of the last dimer given in Eqs. (1) and (11a), we infer
degree-distribution exponent values of 2.18, 2.09, and 2.11, respectively. Thus these three
properties are internally consistent under the assumption that the citation distribution has
a power-law form with exponent in the range 2.1-2.2.

Figure 5. Scaled distribution of distinct degrees $f(z)$ for $N$ up to 10^, with 10''
network realizations for each $N$, for redirection probability $r = \frac{1}{2}$ (corresponding to strictly
linear preferential attachment). The smooth curve is a Gaussian fit to the data.
Visually identical data occur for other redirection probabilities.



Our simulation results indicate that the random quantity $D_N$ is self-averaging. For
strictly linear preferential attachment, we find that the standard deviation $\sigma$ of $D_N$
grows more slowly with $N$ than the mean $\langle D_N\rangle$. Moreover, the probability
distribution $\Pi(D_N)$ of distinct degrees fits the Gaussian

$$\Pi(D_N) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(D_N - \langle D_N\rangle)^2/2\sigma^2} \tag{6}$$

extremely well (Fig. 5). In appropriately scaled coordinates, this form universally
holds for any redirection probability (equivalently, different $\lambda$ values in the attachment
rate $A_k = k + \lambda$). In particular, the scaled distributions $f(z) = \sqrt{2\pi\sigma^2}\,\Pi(D_N)$, with
$z = \sqrt{(D_N - \langle D_N\rangle)^2/2\sigma^2}$, are virtually identical for different $\lambda$ values.

‡ While a power law gives a reasonable visual fit to the data, later and larger-scale analyses
[20, 22, 23, 24, 25] suggest that the citation distribution has a log-normal or stretched exponential behavior,
rather than a power-law form.

3. The First Hole

We now study properties of the degree distribution in the sparse regime, where not
every degree is represented. First consider the location of the first "hole" in the degree
distribution, that is, the smallest degree value for which $N_k = 0$. We define $h_1$ as the degree
value of the first hole, $h_2$ as the degree of the second hole, etc.

To determine the location of the first hole, it is useful to use the probability $P(h)$
that there are no holes in the degree distribution within the range $[1,h]$. This coincides
with the probability that there is at least one node of degree $k$ for every $k$ between 1
and $h$. Again under the assumption that the number of nodes of degree $k$ is given by
an independent Poisson distribution for each $k$, this probability is given by

$$P(h) = \prod_{1\le k\le h}\left[1 - e^{-N_k}\right]. \tag{7}$$

We estimate the location of the first hole from the criterion $P(h_1) = \frac{1}{2}$; however, any
constant between 0 and 1 could equally well be chosen in this condition. Taking the
logarithm of (7) and using $\ln\left[1 - e^{-N_k}\right] \simeq -e^{-N_k}$ (which is justifiable since $e^{-N_k} \ll 1$
when $k < h_1$) gives the following for the average location $\langle h_1\rangle$ of the first hole:

$$\int_1^{\langle h_1\rangle} dk\, e^{-N_k} = \ln 2. \tag{8}$$



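Using Eq. (2) in (8), and noting that the integral is dominated by the region near its
upper limit because the integrand rises steeply with $k$, we find

$$\langle h_1\rangle = \left(\frac{NR}{\ln H}\right)^{1/\nu}, \qquad H = \frac{\langle h_1\rangle^{\nu+1}}{\nu NR\,\ln 2}. \tag{9a}$$

Since $H$ appears inside the logarithm, one can ignore the logarithmic factor in $H$ itself,
thereby giving the simpler and still asymptotically exact formula

$$\langle h_1\rangle \simeq \left(\frac{\nu NR}{\ln(NR)}\right)^{1/\nu}. \tag{9b}$$
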
It is worth noting that the naive calculation that leads to (3) for the number
of distinct degrees ignores the possibility that holes exist in the range $k < (NR)^{1/\nu}$.
According to Eq. (9b), however, the first hole appears earlier than $(NR)^{1/\nu}$ in the
$N \to \infty$ limit. For a terrestrial-scale network with, say, $N$ = 10^ nodes, the location of
the first hole will be roughly 3 times smaller than that predicted by the naive estimate
(3).
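The criterion $P(h_1) = \frac{1}{2}$ can also be applied directly to the product over $k$ of $1 - e^{-N_k}$, without passing to the integral. The sketch below does this for the pure power law $N_k = NRk^{-\nu}$ and compares the result with the leading-order estimate $(\nu NR/\ln(NR))^{1/\nu}$ obtained by evaluating the integral near its upper limit and dropping logarithmic corrections inside the logarithm. The function names and the parameter choice ($N = 10^7$, $R = 4$, $\nu = 3$) are assumptions for illustration.

```python
import math


def first_hole_from_product(n, r_amp, nu, threshold=0.5):
    """Smallest h for which prod_{k<=h} (1 - exp(-N_k)) drops to the threshold."""
    log_threshold = math.log(threshold)
    log_p = 0.0
    k = 0
    while True:
        k += 1
        mean_count = n * r_amp * k ** (-nu)
        # log(1 - exp(-N_k)); -expm1(-x) is strictly positive for x > 0.
        log_p += math.log(-math.expm1(-mean_count))
        if log_p <= log_threshold:
            return k


def first_hole_leading_order(n, r_amp, nu):
    """Leading-order estimate (nu N R / ln(N R))^(1/nu)."""
    nr = n * r_amp
    return (nu * nr / math.log(nr)) ** (1.0 / nu)


n, r_amp, nu = 10**7, 4, 3
h_product = first_hole_from_product(n, r_amp, nu)
h_leading = first_hole_leading_order(n, r_amp, nu)
k_dense = (n * r_amp) ** (1.0 / nu)  # naive boundary of the dense regime
```

At this network size the leading-order estimate lies somewhat below the value obtained from the product, since the neglected corrections decay only logarithmically in $N$; both lie well below the naive boundary $(NR)^{1/\nu}$.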

4. Last Doublet and Last Dimer 

Somewhere in the tail of the degree distribution lies the last doublet, the largest two
consecutive $k$ values for which $N_k > 0$, and the last dimer, the largest $k$ value for which
$N_k = 2$ (Fig. 2). Starting with degree 1, the degree distribution first consists of a
long string of consecutive "occupied" degrees $1 \le k < h_1$, followed by a second string
in the degree range $h_1 < k < h_2$, etc. As the degree increases, these strings become
progressively shorter and above a certain threshold all remaining strings are singlets.
For a large network, the last string that is not a singlet will almost certainly be a doublet
(with probability approaching 1 as $N \to \infty$). We now determine the average position
of this last doublet.

Figure 6. Location of the first hole and the last doublet as a function of $N$ for 10''
realizations of networks that are grown by strictly linear preferential attachment. The
dashed curve is the prediction (9a), while the straight dashed line is the prediction
from Eq. (18).

The probability to have a doublet at $(k, k+1)$ is $N_k^2$ when $k \gg K = (NR)^{1/\nu}$. To
estimate the position of the last doublet $(\delta, \delta+1)$ we employ the extremal criterion

$$\sum_{k\ge\delta} N_k^2 \sim 1, \tag{10}$$

that there should be of the order of one doublet in the degree range $(\delta, \infty)$. Using
$N_k \simeq NRk^{-\nu}$, we obtain

$$\langle\delta\rangle = C\,(RN)^{1/(\nu-1/2)} \tag{11a}$$

with $C$ a constant, for the average position of the last doublet. Notice that the position
of the last doublet also coincides, up to a prefactor of the order of 1, with the position of
the last dimer. A more precise approach to determine the average location of the last
doublet gives the amplitude as

$$C = (2\nu-1)^{-1/(2\nu-1)}\,\Gamma\!\left(\frac{2\nu-2}{2\nu-1}\right). \tag{11b}$$


To establish (11b), we use the independent Poisson approximation to write, for the
probability $F(\delta)$ to have no doublets in the degree range $k \ge \delta$,

$$F(\delta) = \prod_{k\ge\delta}\left[1 - \left(1 - e^{-N_k}\right)^2\right]. \tag{12}$$

This expression is the straightforward generalization of Eq. (7) to the case of doublets.
Since the average number of nodes with degrees in the range $k \ge \delta$ is small, the product
on the right-hand side of (12) simplifies to $\exp\left[-\int_\delta^\infty dk\, N_k^2\right]$. Computing the integral
gives

$$F = \exp\left[-(\delta_0/\delta)^{2\nu-1}\right], \qquad \delta_0 = \left[\frac{(NR)^2}{2\nu-1}\right]^{1/(2\nu-1)}. \tag{13}$$

The probability density $\Phi = \frac{dF}{d\delta}$ for the last doublet is then

$$\Phi(\delta) = \frac{2\nu-1}{\delta_0}\left(\frac{\delta_0}{\delta}\right)^{2\nu}\exp\left[-\left(\frac{\delta_0}{\delta}\right)^{2\nu-1}\right], \tag{14}$$

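from which the average position of the last doublet is given by

$$\langle\delta\rangle = \int_0^\infty d\delta\, \delta\,\Phi(\delta) = \int_0^\infty d\delta\,[1 - F(\delta)]. \tag{15a}$$

Substituting (13) into (15a) leads to

$$\langle\delta\rangle = \Gamma\!\left(\frac{2\nu-2}{2\nu-1}\right)\delta_0, \tag{15b}$$

which reproduces (11). Similarly, the mean-square position of the last doublet is

$$\langle\delta^2\rangle = \int_0^\infty d\delta\, \delta^2\,\Phi(\delta) = 2\int_0^\infty d\delta\, \delta\,[1 - F(\delta)], \tag{16}$$

from which the variance is

$$\langle\delta^2\rangle - \langle\delta\rangle^2 = \left[\Gamma\!\left(\frac{2\nu-3}{2\nu-1}\right) - \Gamma^2\!\left(\frac{2\nu-2}{2\nu-1}\right)\right]\delta_0^2. \tag{17}$$
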

For strictly linear preferential attachment network growth, the above results reduce to

$$\langle\delta\rangle = AN^{2/5}, \quad A = \left(\tfrac{16}{5}\right)^{1/5}\Gamma\!\left(\tfrac{4}{5}\right) = 1.469158\ldots; \qquad \langle\delta^2\rangle^{1/2} = V\langle\delta\rangle, \quad V = \frac{\sqrt{\Gamma(3/5)}}{\Gamma(4/5)} = 1.048182\ldots \tag{18}$$

Following the same line of reasoning, the position of the last triplet, $(\tau-1, \tau, \tau+1)$,
is given by

$$\langle\tau\rangle \sim N^{1/(\nu-1/3)}. \tag{19}$$

For strictly linear preferential attachment, this result gives the dependence $\langle\tau\rangle \sim N^{3/8}$.
Our simulation data are consistent with this prediction.
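The amplitudes quoted for strictly linear preferential attachment follow from the general expressions for the mean and mean-square position of the last doublet. The sketch below evaluates them with the standard-library gamma function; the function names are illustrative assumptions, and the formulas are the ones stated in the text, namely $\langle\delta\rangle = \delta_0\,\Gamma\!\left(\frac{2\nu-2}{2\nu-1}\right)$ and $\langle\delta^2\rangle = \delta_0^2\,\Gamma\!\left(\frac{2\nu-3}{2\nu-1}\right)$ with $\delta_0 = [(NR)^2/(2\nu-1)]^{1/(2\nu-1)}$.

```python
import math


def last_doublet_mean(n, r_amp, nu):
    """Mean position of the last doublet, delta_0 * Gamma((2 nu - 2)/(2 nu - 1))."""
    a = 2.0 * nu - 1.0
    delta0 = ((n * r_amp) ** 2 / a) ** (1.0 / a)
    return delta0 * math.gamma(1.0 - 1.0 / a)


def last_doublet_rms_ratio(nu):
    """Ratio sqrt(<delta^2>) / <delta>, which is independent of N and R."""
    a = 2.0 * nu - 1.0
    return math.sqrt(math.gamma(1.0 - 2.0 / a)) / math.gamma(1.0 - 1.0 / a)


# Strictly linear preferential attachment: R = 4, nu = 3.
amplitude_a = last_doublet_mean(1, 4, 3)  # coefficient of N^(2/5)
ratio_v = last_doublet_rms_ratio(3)
relative_std = math.sqrt(ratio_v**2 - 1.0)  # std / mean of the last-doublet position
```

Because `ratio_v` stays above 1 by a fixed amount, the relative fluctuation `relative_std` does not vanish as $N$ grows. Under these formulas the last-doublet position is therefore not self-averaging, in contrast to the number of distinct degrees.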



Distinct Degrees and Their Distribution in Complex Networks 10 

5. Discussion 

For any broadly distributed integer-valued variable, the underlying distribution exhibits 
intriguing features that stem from the combined influences of discreteness and finiteness. 
Such a distribution is smooth in a dense regime, where every integer value of the variable 
has a non-zero probability of occurrence. In the complementary sparse regime, a variety 
of statistical anomalies arise that quantify the extent of the sparseness (Fig. 2).

For the degree distribution of complex networks that generically have power-law
tails, $N_k \sim N/k^{\nu}$, our main results are: (i) The number of distinct degrees in a network
of $N$ nodes scales as $N^{1/\nu}$. This generic behavior is also observed in the citation network
of the Physical Review. (ii) The distribution of the number of distinct degrees is very
well fit by a universal Gaussian function. (iii) There is a rich set of behaviors for
basic characteristics of the sparse regime, such as the positions of holes (zeros) in the
distribution, as well as the locations of doublets, triplets, etc., and the locations of dimers,
trimers, etc. All of these quantities can be determined by simple probabilistic reasoning.

Our analysis tacitly assumed that the number of nodes of different degrees, $N_i$
and $N_j$ for $i \ne j$, are uncorrelated, and that the $N_k$'s are Poisson distributed random
quantities. While these assumptions are questionable in the sparse regime, predictions
that are based on these assumptions are in excellent agreement with results from 
simulations of preferential attachment networks. While we believe that our predictions 
are asymptotically exact, a more rigorous analysis is needed to justify them and explain 
their validity (or at least their impressive accuracy) . A challenging extension of this work 
is to probe the fluctuations in the total number of distinct degrees. The mechanism for 
the observed Gaussian shape of the distribution of distinct degrees is not at all evident. 
In fact, for networks that grow by redirection, with redirection becoming more certain 
as the degree of the ancestor node increases, the total number of distinct degrees is not 
even a self-averaging quantity [26] . 

Our methods apply equally well to other heavy-tailed integer-valued distributions,
such as the cluster-size distribution in classical percolation [27] and in protein interaction
and regulatory networks [28]. The latter models often exhibit an infinite-order
percolation transition, in which the cluster-size distribution has an algebraic tail in
the entire non-percolating phase [29, 30, 31, 32, 33, 34, 35, 36]. Our approach leads to
new results for the total number of distinct cluster types $C_N$, for the position of the first
hole (the minimal size that is not present), etc.

For concreteness, consider networks that are built by adding nodes one at a time
with each new node connecting to $k$ randomly chosen existing nodes with probability
$p_k$ [35, 36]. While the set of probabilities $p_k$, $k = 0, 1, 2, \ldots$, with $\sum_{k\ge 0} p_k = 1$,
fully defines the network ensemble, only the first two moments, $\langle k\rangle = \sum_{k\ge 1} kp_k$ and
$\Delta = \langle k^2\rangle - \langle k\rangle^2$, matter in determining large-scale properties. In the non-percolating
phase, $\langle k\rangle < \frac{1}{2}$ and $\Delta < \frac{1}{4}$, we use the decay exponent for the cluster-size distribution
that was determined in [35, 36] to obtain

$$C_N \sim N^{\left(1-\sqrt{1-4\Delta}\right)/\left(3-\sqrt{1-4\Delta}\right)}.$$

At the percolation transition, $\langle k\rangle < \frac{1}{2}$ and $\Delta = \frac{1}{4}$, the tail of the cluster-size
distribution contains universal (independent of $\langle k\rangle$ and $\Delta$) algebraic and logarithmic
factors, viz. $c_s \simeq 2(1 - 2\langle k\rangle)^{-2}s^{-3}(\ln s)^{-2}$ for $s \gg 1$. A straightforward generalization
of our previous analysis shows that the total number of distinct cluster types grows as

$$C_N \simeq 2^{1/3}\,\Gamma\!\left(\tfrac{2}{3}\right)\left[\frac{3}{1-2\langle k\rangle}\right]^{2/3}\left[\frac{N}{(\ln N)^2}\right]^{1/3}.$$




As a final note, this work has focused broadly on properties associated with the
support of a discrete distribution. The averages of these properties over a large ensemble
of networks have systematic dependences on the number of nodes $N$ in the network;
however, the behavior in each network realization may not be monotonic. Thus while
$k_{\max}$ is clearly a non-decreasing function of $N$, the number of distinct degrees and
the locations of quantities like the first hole or the last doublet can both increase or
decrease with $N$. This intriguing aspect of the problem may provide a more detailed
understanding of how a complex network actually grows.